I) Introduction

II) Data

III) Transfer Learning Models

IV) Conclusion

I) Introduction

Continuing after Project-1, where we did flower classification using MobileNetV2, we now explore the VGG16 and Xception models on the same dataset and compare their performance.

A pre-trained model is a network that has previously been trained on a large dataset and then saved.

The idea behind transfer learning for image classification is that a model trained on a very large, representative dataset can serve as our base model for categorising images. Reusing its learned feature maps saves a lot of training time.

VGG16 - The VGG16 architecture consists of 16 layers, including 13 convolutional layers and 3 fully connected layers. The convolutional layers have small 3x3 filters and are followed by max-pooling layers. The number of filters in the convolutional layers increases as we move deeper into the network. The fully connected layers have 4096 neurons each and are followed by a softmax layer for classification. The model has more than 138 million parameters and is pre-trained on the ImageNet dataset, achieving top results in image classification and object detection tasks.

Xception - The Xception model consists of 36 convolutional layers organized into 14 modules. The first module contains standard convolutional layers; the subsequent modules are built from depthwise separable convolutions, in which a depthwise convolution (one filter per input channel) is followed by a pointwise 1x1 convolution. The number of filters grows through the entry flow, stays constant through the middle flow and grows again in the exit flow. Most modules are wrapped in linear residual (skip) connections that add a module's input to its output. The final layers are global average pooling, a fully connected layer and a softmax layer for classification.

II) Data

2.1 - Load & explore data

Now we load the images and transform them into NumPy arrays.
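As a rough sketch, the loading step might look like the following. The helper name `load_images`, the one-sub-folder-per-species layout and the 224x224 target size are illustrative assumptions, not the notebook's exact code.

```python
from pathlib import Path

import numpy as np
from PIL import Image

def load_images(data_dir, size=(224, 224)):
    """Load every .jpg under data_dir/<class_name>/ into NumPy arrays."""
    images, labels = [], []
    # One sub-folder per flower species (assumed layout).
    class_names = sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())
    for label, name in enumerate(class_names):
        for img_path in sorted(Path(data_dir, name).glob("*.jpg")):
            img = Image.open(img_path).convert("RGB").resize(size)
            images.append(np.asarray(img))
            labels.append(label)
    return np.array(images), np.array(labels), class_names
```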

Plotting some random flowers

2.2 - Label encoding

The labels are the five species indices (0 to 4). We need to encode these labels as one-hot vectors. For instance, an image of a sunflower has label 3 and the corresponding y = [0,0,0,1,0].
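One-hot encoding can be done with `keras.utils.to_categorical`, or directly in NumPy by indexing the identity matrix, as in this small sketch (the example label values are illustrative):

```python
import numpy as np

# Example integer labels for four images (values are illustrative).
labels = np.array([0, 3, 4, 1])

# One-hot encode: row i of the 5x5 identity matrix is the vector for class i.
y = np.eye(5, dtype=int)[labels]

print(y[1])  # the sunflower example: class 3 -> [0 0 0 1 0]
```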

2.3 - Split training and validation set

Here we split our dataset into a training, a validation and a test set. This avoids bias: the model is trained on images with known labels, then we measure its accuracy on the validation set, i.e. on images the model has never seen. Finally, we compute the accuracy on the test set.

III) Transfer Learning Models

3.1 About the optimizer and learning rate

Once our model is built, we need to specify an accuracy metric, a loss function and an optimisation algorithm.

The accuracy metric is used to evaluate the performance of the model.

The loss function measures how the model performs on data with known labels: it tells us how poorly the model is doing in a supervised setting. For multi-class classification, we use a specific loss function called categorical_crossentropy (the cross-entropy loss from information theory).

Finally, the optimizer is used to minimize the loss function by adjusting the model parameters (weight values, filter kernel values, etc.).

For this classification problem, we choose the RMSprop optimizer, which is efficient and commonly used (see the Keras optimizer documentation for more details).
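Putting these pieces together, the compile step might look like this minimal sketch. The tiny stand-in model and the 1e-3 learning rate (RMSprop's default) are illustrative assumptions; the real transfer-learning models are defined later.

```python
from tensorflow import keras

# A tiny stand-in model just to demonstrate the compile call.
model = keras.Sequential([keras.layers.Dense(5, activation="softmax")])
model.build((None, 8))

# RMSprop optimizer, categorical cross-entropy loss, accuracy metric.
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```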

Since deep networks can take quite a while to converge, we will anneal the learning rate (LR) during training.

The LR is basically the step size by which the optimizer 'walks'. A high LR corresponds to big steps, so convergence is faster; however, the search is then less precise and the optimizer may not settle into the right minimum.

Conversely, a low LR means the optimizer will probably find the right local minimum, but it will take a long time.

The idea is to start from a moderately high value and then decrease the LR during training, so that we reach the minimum of the loss function efficiently. Using the ReduceLROnPlateau callback, we can reduce the LR by a coefficient (here 0.75) if the accuracy has not improved after a number of epochs (here 3).


In addition, we use the EarlyStopping callback to control the training time: if the accuracy has not improved after 5 epochs, we stop training.

Finally, we use the ModelCheckpoint callback, which saves the best weights found during training.
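The three callbacks described above could be set up roughly as follows. Monitoring `val_accuracy` and the checkpoint file name are assumptions on my part; only the factor (0.75) and the patience values (3 and 5) come from the text.

```python
from tensorflow import keras

# Multiply the LR by 0.75 if validation accuracy plateaus for 3 epochs.
reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor="val_accuracy", factor=0.75, patience=3, verbose=1)

# Stop training if validation accuracy has not improved for 5 epochs.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=5, restore_best_weights=True)

# Save the weights of the best epoch seen so far (file name is assumed).
checkpoint = keras.callbacks.ModelCheckpoint(
    "best_model.keras", monitor="val_accuracy", save_best_only=True)

callbacks = [reduce_lr, early_stop, checkpoint]
```

These would then be passed to `model.fit(..., callbacks=callbacks)`.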

3.2 Define the model - VGG16
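A hedged sketch of what the VGG16 transfer-learning model might look like: a frozen convolutional base plus a new classification head. The head (256 units, 50% dropout) is an assumption, and `weights=None` is used only so the sketch runs offline; in practice you would pass `weights="imagenet"` to download the pre-trained weights.

```python
from tensorflow import keras
from tensorflow.keras.applications import VGG16

# Convolutional base. Use weights="imagenet" in practice;
# weights=None keeps this sketch runnable offline.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained feature extractor

# New classification head for the 5 flower classes (sizes are assumed).
model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(5, activation="softmax"),
])
```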

3.3 - Data augmentation

A useful trick to avoid overfitting is data augmentation. What is that? The idea is to artificially add data to our dataset. Not arbitrary data, of course: we apply tiny transformations to existing images to produce very similar new ones.

For instance, we rotate an image by a few degrees, shift it off-centre, or zoom in or out slightly. Common augmentation techniques include horizontal/vertical flips, rotations, translations, rescaling, random crops, brightness adjustments and more.

Thanks to these transformations, we can effectively multiply the size of the dataset (x2, x3) and train our model in a much more robust way.
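These transformations can be expressed with Keras's `ImageDataGenerator`, as in the sketch below. The exact transform ranges are assumptions; the demo runs on a dummy batch in place of the real training images.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Small random transforms; the ranges here are illustrative assumptions.
datagen = ImageDataGenerator(
    rotation_range=15,      # rotate by up to +/-15 degrees
    width_shift_range=0.1,  # de-center horizontally by up to 10%
    height_shift_range=0.1, # ... and vertically
    zoom_range=0.1,         # zoom in/out by up to 10%
    horizontal_flip=True,
)

# Dummy batch standing in for real training images.
X_batch = np.random.rand(8, 224, 224, 3)
augmented = next(datagen.flow(X_batch, batch_size=8, shuffle=False))
print(augmented.shape)  # same shape as the input batch
```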

3.4 Feature extraction
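Feature extraction means running the images through the frozen convolutional base once and keeping the resulting feature maps. A minimal sketch (again with `weights=None` so it runs offline; use `weights="imagenet"` in practice, and real images instead of the dummy batch):

```python
import numpy as np
from tensorflow.keras.applications import VGG16

# Frozen convolutional base acting as a fixed feature extractor.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False

# Run a (dummy) batch of images through the base to get feature maps.
X_batch = np.random.rand(4, 224, 224, 3)
features = base.predict(X_batch, verbose=0)
print(features.shape)  # (4, 7, 7, 512): VGG16's final feature maps
```

A small dense classifier can then be trained on these extracted features, which is much cheaper than back-propagating through the whole base.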

3.5 Plotting Accuracy and Loss - VGG16
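The accuracy and loss curves can be plotted from the history returned by `model.fit`. In this sketch the history numbers are purely illustrative stand-ins, not the actual training results.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt

# Stand-in for model.fit(...).history (numbers are illustrative).
history = {
    "accuracy":     [0.55, 0.70, 0.78, 0.83, 0.85],
    "val_accuracy": [0.52, 0.68, 0.75, 0.80, 0.85],
    "loss":         [1.30, 0.85, 0.60, 0.48, 0.42],
    "val_loss":     [1.40, 0.92, 0.68, 0.55, 0.47],
}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history["accuracy"], label="train")
ax1.plot(history["val_accuracy"], label="validation")
ax1.set_title("Accuracy")
ax1.set_xlabel("epoch")
ax1.legend()
ax2.plot(history["loss"], label="train")
ax2.plot(history["val_loss"], label="validation")
ax2.set_title("Loss")
ax2.set_xlabel("epoch")
ax2.legend()
fig.savefig("training_curves.png")
```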

3.6 Define the model - Xception
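The Xception model can be set up the same way as VGG16, with one practical difference: Xception's default input size is 299x299. As before, the head is an assumption and `weights=None` is only there so the sketch runs offline (use `weights="imagenet"` in practice).

```python
from tensorflow import keras
from tensorflow.keras.applications import Xception

# Frozen Xception base; note the 299x299 default input size.
base = Xception(weights=None, include_top=False, input_shape=(299, 299, 3))
base.trainable = False

# Minimal classification head for the 5 flower classes (assumed).
model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(5, activation="softmax"),
])
```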

3.7 Plotting Accuracy and Loss - Xception

IV) Conclusion

In conclusion, we trained VGG16 and Xception models on a flower classification dataset and achieved high accuracy with both. The VGG16 model reached a test accuracy of 85%, while the Xception model reached a slightly lower test accuracy of 84.26%. However, Xception had a higher validation accuracy (87.50%) than VGG16 (84.98%). Training accuracies were similar: 85.18% for VGG16 and 84.90% for Xception. Both models therefore performed well on the flower classification task, but Xception may have a slight advantage in generalization. Further experimentation and evaluation could explore the strengths and weaknesses of each model in more detail.